feat(risk): cross-source identity keying — recover risk for ~10% of repos #41
…oss-source identity)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 91d05470ad
slug = str(identity.get("repo_full_name") or "").rsplit("/", 1)[-1]
if slug and slug not in lookup:
    lookup[slug] = entry
Deduplicate risk aliases before aggregating
When a truth project has both a display name and a different repo_full_name slug, this new alias makes build_risk_lookup() return two keys for the same project. _extract_risk_posture() was updated to dedupe by object identity, but other existing consumers still aggregate by iterating the returned map directly; for example export_html_dashboard() counts every _risk_lookup.items() entry in src/web_export.py:108-111, and the Markdown reporter similarly iterates risk_lookup.values() in src/reporter.py:1684-1688. In those surfaces, a single elevated repo like Signal & Noise / signal-noise is now reported as two elevated repos and can appear twice in the elevated list.
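One way the remaining consumers could avoid the double count is to collapse aliases by object identity before aggregating. The following is a minimal sketch, not the repository's code: `unique_entries`, the `risk_tier` field, and the sample data are hypothetical names chosen for illustration, and it assumes aliased keys point at the same entry object, as the diff above suggests.

```python
from typing import Any, Dict, Iterator, Set


def unique_entries(lookup: Dict[str, Any]) -> Iterator[Any]:
    """Yield each distinct entry object once, even if several keys alias it."""
    seen: Set[int] = set()
    for entry in lookup.values():
        # id() is stable here because the lookup keeps every entry alive.
        if id(entry) in seen:
            continue
        seen.add(id(entry))
        yield entry


# Hypothetical data: one project reachable under two keys, plus one other.
shared = {"risk_tier": "elevated"}  # "risk_tier" is an assumed field name
lookup = {
    "Signal & Noise": shared,  # display name key
    "signal-noise": shared,    # repo slug alias, same object
    "other-repo": {"risk_tier": "baseline"},
}

# Naive aggregation counts the aliased project twice.
naive_elevated = [e for e in lookup.values() if e["risk_tier"] == "elevated"]

# Deduplicated aggregation counts it once.
elevated = [e for e in unique_entries(lookup) if e["risk_tier"] == "elevated"]
```

Deduplicating on identity rather than on entry contents keeps two genuinely different projects with identical risk data from being merged.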
Arc G — risk identity keying (cross-source)
Fixes the pre-existing limitation surfaced by Arc B's code review: risk is computed in portfolio truth (keyed by local-dir display_name), but every render consumer iterates audit data (keyed by GitHub metadata.name). For repos whose dir name ≠ GitHub repo name, risk rendered blank on every surface, including the already-shipped All Repos column.
Fix (mirrors the existing _select_security_entry GHAS join)
- IdentityFields.repo_full_name (additive, from _git_remote_full_name, already captured at scan time but previously dropped).
- build_risk_lookup (report_enrichment.py) and load_risk_truth (excel_export_truth_helpers.py) also key each entry by the repo slug, so consumers keying by metadata.name resolve.
- _extract_risk_posture dedups by identity and load_risk_truth increments posture once per project, so there is no double-counting (verified: posture sum == project count).
Measured impact
On a real audit report, repos resolving risk went 86% → 96% — 12 repos recovered (signal-noise, devils-advocate, PhantomFrequencies, seismoscope, …) that were silently blank everywhere. 121/129 truth projects now carry the slug (the 8 without have no GitHub remote).
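The keying change described above can be sketched roughly as follows. This is an illustrative approximation rather than the actual build_risk_lookup: the function name `build_lookup_sketch`, the `identity` / `risk` project layout, the `risk_tier` field, and the `example-owner` remote are all assumptions; only the three slug lines come from the reviewed diff.

```python
from typing import Any, Dict, List


def build_lookup_sketch(projects: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Key each risk entry by display name, plus the repo slug as an alias."""
    lookup: Dict[str, Any] = {}
    for project in projects:
        identity = project.get("identity", {})  # assumed layout
        entry = project.get("risk", {})         # assumed layout
        name = str(identity.get("display_name") or "")
        if name:
            lookup[name] = entry
        # Slug alias, as in the reviewed diff: last path segment of owner/repo.
        slug = str(identity.get("repo_full_name") or "").rsplit("/", 1)[-1]
        if slug and slug not in lookup:
            lookup[slug] = entry
    return lookup


# Hypothetical truth projects: one with a GitHub remote, one without.
projects = [
    {
        "identity": {
            "display_name": "Signal & Noise",
            "repo_full_name": "example-owner/signal-noise",
        },
        "risk": {"risk_tier": "elevated"},
    },
    {
        "identity": {"display_name": "local-only", "repo_full_name": None},
        "risk": {"risk_tier": "baseline"},
    },
]

lookup = build_lookup_sketch(projects)

# A consumer keyed by the GitHub repo name now resolves the same entry.
resolved = lookup.get("signal-noise")

# Counting distinct entry objects mirrors the "posture sum == project count" check.
distinct_entries = len({id(e) for e in lookup.values()})
```

A project without a GitHub remote yields an empty slug and simply gets no alias, which matches the 8 truth projects reported as not carrying one.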
Verification